Video Feature Extraction¶

Now that we have our training and testing tensors, we can start extracting relevant features from the video data.

Importing Libraries and Data¶

In [1]:
import os
import numpy as np
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
from scipy.signal import convolve2d
import tensorly as tl
import gc
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
In [2]:
train_test_data_dir = os.path.join(os.getcwd(), 'Train_Test_Data')
train_videos = np.load(os.path.join(train_test_data_dir, 'train_videos.npy'))
test_videos = np.load(os.path.join(train_test_data_dir, 'test_videos.npy'))
y_train = np.load(os.path.join(train_test_data_dir, 'y_train.npy'))
y_test = np.load(os.path.join(train_test_data_dir, 'y_test.npy'))
id_train = np.load(os.path.join(train_test_data_dir, 'id_train.npy'))
id_test = np.load(os.path.join(train_test_data_dir, 'id_test.npy'))

print(train_videos.shape)
print(test_videos.shape)
print(y_train.shape)
print(y_test.shape)
print(id_train.shape)
print(id_test.shape)

labels = ['Neutral', 'Calm', 'Happy', 'Sad', 'Angry', 'Fearful', 'Disgust', 'Surprised']
(240, 426, 3, 50, 960)
(240, 426, 3, 50, 480)
(960,)
(480,)
(960, 7)
(480, 7)

Breakdown of Custom 3D Histogram of Oriented Gradients for Dimensionality Reduction¶

The formulation of this Histogram of Oriented Gradients (HOG) algorithm is loosely based on recent research in computer vision; this project extends the approach to three dimensions. Here's an overview of the methodology, applied identically to the training and testing datasets:

1. Iterate through every sample, convert the frames to grayscale, and (optionally) apply the following Gaussian filter to each frame of the video ($V$):

$$Gaussian\>Filter \> \Longrightarrow \> \frac{1}{1115}\begin{pmatrix} 1 & 4 & 7 & 10 & 7 & 4 & 1 \\ 4 & 12 & 26 & 33 & 26 & 12 & 4 \\ 7 & 26 & 55 & 71 & 55 & 26 & 7 \\ 10 & 33 & 71 & 91 & 71 & 33 & 10 \\ 7 & 26 & 55 & 71 & 55 & 26 & 7 \\ 4 & 12 & 26 & 33 & 26 & 12 & 4 \\ 1 & 4 & 7 & 10 & 7 & 4 & 1 \end{pmatrix} \\ $$
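As a quick sanity check (not part of the original pipeline), the kernel entries sum to 1115, so the normalized filter preserves mean intensity:

```python
import numpy as np

# 7x7 Gaussian-like kernel from the text, before normalization
kernel = np.array([[1, 4, 7, 10, 7, 4, 1],
                   [4, 12, 26, 33, 26, 12, 4],
                   [7, 26, 55, 71, 55, 26, 7],
                   [10, 33, 71, 91, 71, 33, 10],
                   [7, 26, 55, 71, 55, 26, 7],
                   [4, 12, 26, 33, 26, 12, 4],
                   [1, 4, 7, 10, 7, 4, 1]])

print(kernel.sum())           # 1115
print((kernel / 1115).sum())  # ~1.0, so smoothing does not change overall brightness
```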

2. We then compute the gradients with respect to the width, height, and frames [1] ($\frac{\partial V}{\partial x}$, $\frac{\partial V}{\partial y}$, $\frac{\partial V}{\partial z}$).

3. Using these gradients, we can compute the three-dimensional gradient magnitude for each video. $$ G = \sqrt{\left( \frac{\partial V}{\partial x}\right)^2 + \left( \frac{\partial V}{\partial y} \right)^2 + \left( \frac{\partial V}{\partial z}\right)^2} $$

4. Generally, for an image ($I$), the gradient direction is computed as $\theta = \arctan \left( \frac{\frac{\partial I}{\partial y}}{\frac{\partial I}{\partial x}} \right)$ [2]. Since we have videos, we must compute both the azimuthal angle and the polar angle to capture the 3D feature space [4].

$$ \theta_{azimuth} = \arctan \left( \frac{\frac{\partial V}{\partial y}}{\frac{\partial V}{\partial x}} \right) \\ $$

$$ \phi_{polar} = \arctan \left( \frac{\sqrt{\left( \frac{\partial V}{\partial x}\right)^2 + \left( \frac{\partial V}{\partial y} \right)^2}}{\frac{\partial V}{\partial z}} \right) \\ $$

5. With our three sets of features, we can now partition the video into cells [3]. In our case, the cell size is ($5$, $6$, $5$), which groups $5 \times 6 \times 5 = 150$ pixels together.

6. With our grouped pixels, we cluster the gradient magnitudes of each pixel ($G_{(i, j, k)}$) into bins based on the azimuthal and polar angles. With $9$ bins, we sum the gradient magnitudes of all the pixels belonging to each bin, for both types of angle, which reduces the dimensionality from $150$ points per cell to $9 \times 2 = 18$ points per cell. We can then save these results to disk.
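Step 6 can be sketched for a single cell as follows — a minimal illustration on random stand-in data, using the same cell shape and bin count as above:

```python
import numpy as np

rng = np.random.default_rng(0)
cell_shape = (5, 6, 5)  # (width, height, frames) -> 150 pixels per cell

# Stand-ins for one cell's gradient magnitude and angle fields
cell_magnitude = rng.random(cell_shape)
cell_azimuthal = rng.uniform(-np.pi, np.pi, cell_shape)
cell_polar = rng.uniform(0, np.pi, cell_shape)

nbins = 9
# Magnitude-weighted histograms over each angle
hist_azimuthal, _ = np.histogram(cell_azimuthal, bins=nbins,
                                 range=(-np.pi, np.pi), weights=cell_magnitude)
hist_polar, _ = np.histogram(cell_polar, bins=nbins,
                             range=(0, np.pi), weights=cell_magnitude)

descriptor = np.stack([hist_azimuthal, hist_polar], axis=-1)
print(descriptor.shape)  # (9, 2): 18 values summarize the 150 pixels in the cell
```

Because every angle falls inside the histogram range, each histogram redistributes (rather than discards) the cell's total gradient magnitude across its bins.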

Example of HOG Features (in 2D):

HOG Features

Source: https://www.sciencedirect.com/topics/computer-science/histogram-of-oriented-gradient

We follow a similar approach as outlined in the image above, but in 3D.

Here's an example of what the gradient magnitude, azimuthal angle, and polar angle look like for the videos:¶

In [3]:
sample = 72 #random video from training data

gray_frames = tl.tenalg.mode_dot(train_videos[:, :, :, :, sample], np.array([0.2989, 0.5870, 0.1140]), mode=2)

gx = np.gradient(gray_frames, axis=0)
gy = np.gradient(gray_frames, axis=1)
gz = np.gradient(gray_frames, axis=2)

magnitude = np.sqrt(gx**2 + gy**2 + gz**2)
azimuthal_angle = np.arctan2(gy, gx)
polar_angle = np.arctan2(np.sqrt(gx**2 + gy**2), gz)

del gx, gy, gz
gc.collect()

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(15, 7))
axes=axes.flatten()

im1 = axes[0].imshow(gray_frames[:, :, 0], cmap='gray')
im2 = axes[1].imshow(magnitude[:, :, 0], cmap='gray')
im3 = axes[2].imshow(azimuthal_angle[:, :, 0], cmap='gray')
im4 = axes[3].imshow(polar_angle[:, :, 0], cmap='gray')

axes[0].set_title('Grayscale Video')
axes[1].set_title('Gradient Magnitude')
axes[2].set_title('Azimuthal Angle')
axes[3].set_title('Polar Angle')

def update(i):
    im1.set_array(gray_frames[:, :, i])
    im2.set_array(magnitude[:, :, i])
    im3.set_array(azimuthal_angle[:, :, i])
    im4.set_array(polar_angle[:, :, i])
    return [im1, im2, im3, im4]
    
plt.tight_layout()
plt.close(fig)

fig.suptitle(f'Example features for {labels[y_train[sample]-1]} emotion')
ani = FuncAnimation(fig, update, frames=gray_frames.shape[2]//2, blit=False, interval=100)
HTML(ani.to_jshtml())
Out[3]:
(Animated figure: grayscale frames alongside the gradient magnitude, azimuthal angle, and polar angle panels.)
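One detail worth noting (an observation, not stated in the original text): since the polar angle's numerator $\sqrt{(\partial V / \partial x)^2 + (\partial V / \partial y)^2}$ is non-negative, `np.arctan2` confines the polar angle to $[0, \pi]$, while the azimuthal angle spans $(-\pi, \pi]$ — which is why the two histograms in the extraction code use different ranges:

```python
import numpy as np

rng = np.random.default_rng(1)
gx, gy, gz = rng.standard_normal((3, 1000))  # stand-in gradient samples

azimuth = np.arctan2(gy, gx)                    # range (-pi, pi]
polar = np.arctan2(np.sqrt(gx**2 + gy**2), gz)  # range [0, pi]: numerator >= 0

print(azimuth.min() >= -np.pi and azimuth.max() <= np.pi)  # True
print(polar.min() >= 0 and polar.max() <= np.pi)           # True
```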

Implementing Feature Extraction and Saving Data¶

In [4]:
def calculate_total_descriptor_shape(width, height, n_frames, cell_size, nbins):
    """Calculate the total shape of the descriptor."""

    cells_per_width = width // cell_size[0]
    cells_per_height = height // cell_size[1]
    cells_per_depth = n_frames // cell_size[2]

    return cells_per_width, cells_per_height, cells_per_depth, nbins, 2

def compute_hog3d_rgb(frames, cell_size, nbins, v, gaussian_filter):
    """Compute the HOG3D features for a video."""

    width, height, channels, n_frames = frames.shape
    gray_frames = tl.tenalg.mode_dot(frames, np.array([0.2989, 0.5870, 0.1140]), mode=2)

    if gaussian_filter:
        print('Applying Gaussian Filter')
        gaussian_filter = np.array([[1, 4, 7, 10, 7, 4, 1],
                                [4, 12, 26, 33, 26, 12, 4],
                                [7, 26, 55, 71, 55, 26, 7],
                                [10, 33, 71, 91, 71, 33, 10],
                                [7, 26, 55, 71, 55, 26, 7],
                                [4, 12, 26, 33, 26, 12, 4],
                                [1, 4, 7, 10, 7, 4, 1]]) / 1115



        data = np.zeros_like(gray_frames)
        for i in range(n_frames):
            data[:, :, i] = convolve2d(gray_frames[:, :, i], gaussian_filter, mode='same')
    else:
        data = gray_frames
        del gray_frames
        gc.collect()

    gx = np.gradient(data, axis=0)
    gy = np.gradient(data, axis=1)
    gz = np.gradient(data, axis=2)

    magnitude = np.sqrt(gx**2 + gy**2 + gz**2)
    azimuthal_angle = np.arctan2(gy, gx)
    polar_angle = np.arctan2(np.sqrt(gx**2 + gy**2), gz)

    output_shape = calculate_total_descriptor_shape(width, height, n_frames, cell_size, nbins)
    hog3d_descriptors = np.zeros(output_shape, dtype=np.float32)

    # '+ 1' ensures the final cell along each axis is filled (dimensions divide evenly here)
    for i in range(0, n_frames - cell_size[2] + 1, cell_size[2]):
        for y in range(0, height - cell_size[1] + 1, cell_size[1]):
            for x in range(0, width - cell_size[0] + 1, cell_size[0]):
                cell_magnitude = magnitude[x:x + cell_size[0], y:y + cell_size[1], i:i + cell_size[2]]
                cell_azimuthal = azimuthal_angle[x:x + cell_size[0], y:y + cell_size[1], i:i + cell_size[2]]
                cell_polar = polar_angle[x:x + cell_size[0], y:y + cell_size[1], i:i + cell_size[2]]

                hist_azimuthal, _ = np.histogram(cell_azimuthal, bins=nbins, range=(-np.pi, np.pi), weights=cell_magnitude)
                hist_polar, _ = np.histogram(cell_polar, bins=nbins, range=(0, np.pi), weights=cell_magnitude)

                hist_azimuthal = hist_azimuthal / (np.linalg.norm(hist_azimuthal) + 1e-6)
                hist_polar = hist_polar / (np.linalg.norm(hist_polar) + 1e-6)

                hog3d_descriptors[x//cell_size[0], y//cell_size[1], i//cell_size[2], :, 0] = hist_azimuthal
                hog3d_descriptors[x//cell_size[0], y//cell_size[1], i//cell_size[2], :, 1] = hist_polar

    del cell_magnitude, cell_azimuthal, cell_polar, hist_azimuthal, hist_polar, gx, gy, gz, magnitude, azimuthal_angle, polar_angle
    gc.collect()

    return hog3d_descriptors

def process_videos(videos, cell_size=(5, 6, 5), nbins=9, gaussian_filter=False):
    """Process a batch of videos."""

    width, height, channels, n_frames, video_count = videos.shape

    for v in tqdm(range(video_count)):
        hog3d_descriptors = compute_hog3d_rgb(videos[:, :, :, :, v].astype(np.float32), cell_size, nbins, v, gaussian_filter)
        if v == 0:
            descriptors_shape = hog3d_descriptors.shape
            all_descriptors = np.zeros((video_count, *descriptors_shape), dtype=np.float32)

        all_descriptors[v] += hog3d_descriptors

        del hog3d_descriptors
        gc.collect()

    return all_descriptors

def save_to_disk(variable, filename):
    np.save(filename, variable)
    del variable
    gc.collect()
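As a quick check of the descriptor geometry (using the dimensions printed earlier), a cell size of (5, 6, 5) with 9 bins partitions each video into 48 × 71 × 10 cells of 18 values each:

```python
width, height, n_frames = 240, 426, 50  # video dimensions from the loaded tensors
cell_size, nbins = (5, 6, 5), 9

cells = (width // cell_size[0], height // cell_size[1], n_frames // cell_size[2])
print(cells)               # (48, 71, 10)
print(cells + (nbins, 2))  # (48, 71, 10, 9, 2): per-video descriptor shape
```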
In [5]:
H_train = process_videos(train_videos)
H_test = process_videos(test_videos)

print(H_train.shape)
print(H_test.shape)

hog3d_data_dir = os.path.join(os.getcwd(), 'HOG3d_Data')
os.makedirs(hog3d_data_dir, exist_ok=True)

save_to_disk(H_train, os.path.join(hog3d_data_dir, 'H_train.npy'))
save_to_disk(H_test, os.path.join(hog3d_data_dir, 'H_test.npy'))
(960, 48, 71, 10, 9, 2)
(480, 48, 71, 10, 9, 2)
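For downstream models, each per-video descriptor can be flattened into a single feature vector — a hypothetical usage sketch, using a small zero-filled stand-in with the same per-video shape as `H_train`:

```python
import numpy as np

# Stand-in: 4 videos with the per-video descriptor shape printed above
H = np.zeros((4, 48, 71, 10, 9, 2), dtype=np.float32)

X = H.reshape(H.shape[0], -1)
print(X.shape)  # (4, 613440): one flat feature vector per video
```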

References:¶

[1] SpringerLink (Online service), Panigrahi, C. R., Pati, B., Mohapatra, P., Buyya, R., & Li, K. (2021). Progress in Advanced Computing and Intelligent Engineering: Proceedings of ICACIE 2019, Volume 1 (1st ed. 2021.). Springer Singapore : Imprint: Springer. https://doi.org/10.1007/978-981-15-6584-7

[2] Zoubir, H., Rguig, M., Aroussi, M., Chehri, A., & Saadane, R. (2022). Concrete Bridge Crack Image Classification Using Histograms of Oriented Gradients, Uniform Local Binary Patterns, and Kernel Principal Component Analysis. Electronics, 11(20), 3357. https://doi.org/10.3390/electronics11203357

[3] S. V. Shidlovskiy et al. (2020). J. Phys.: Conf. Ser. 1611 012072. https://doi.org/10.1088/1742-6596/1611/1/012072

[4] Nykamp DQ, “Spherical coordinates.” From Math Insight. http://mathinsight.org/spherical_coordinates